55 research outputs found

    An Information-Theoretic Analysis of Deduplication

    Deduplication finds and removes long-range data duplicates. It is commonly used in cloud and enterprise server settings and has been successfully applied to primary, backup, and archival storage. Despite its practical importance as a source-coding technique, its analysis from the point of view of information theory is missing. This paper provides such an information-theoretic analysis of data deduplication. It introduces a new source model adapted to the deduplication setting. It formalizes the two standard fixed-length and variable-length deduplication schemes, and it introduces a novel multi-chunk deduplication scheme. It then provides an analysis of these three deduplication variants, emphasizing the importance of boundary synchronization between source blocks and deduplication chunks. In particular, under fairly mild assumptions, the proposed multi-chunk deduplication scheme is shown to be order optimal. Comment: 27 pages.
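    For intuition, here is a minimal Python sketch of the fixed-length variant mentioned above (the chunk size, hash choice, and store layout are illustrative assumptions, not the scheme formalized in the paper): the stream is split into fixed-size chunks, each distinct chunk is stored once, and the stream is represented by a sequence of chunk fingerprints.

```python
import hashlib

def fixed_length_dedup(data: bytes, chunk_size: int = 4096):
    """Toy fixed-length deduplication: store each distinct chunk once and
    represent the stream as an ordered list of chunk fingerprints."""
    store = {}    # fingerprint -> chunk bytes (the deduplicated store)
    recipe = []   # fingerprints needed to reconstruct the original stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)
        recipe.append(fp)
    return store, recipe

# A long-range duplicate is stored only once:
store, recipe = fixed_length_dedup(b"A" * 8192 + b"B" * 4096 + b"A" * 8192)
print(len(recipe), len(store))   # 5 chunk references, 2 distinct chunks
```

    Note that a single inserted byte would shift every later chunk boundary and destroy most matches; this boundary-synchronization issue is what the paper emphasizes and what variable-length and multi-chunk schemes are designed to handle.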

    The Approximate Capacity of the Gaussian N-Relay Diamond Network

    We consider the Gaussian "diamond" or parallel relay network, in which a source node transmits a message to a destination node with the help of N relays. Even for the symmetric setting, in which the channel gains to the relays are identical and the channel gains from the relays are identical, the capacity of this channel is unknown in general. The best known capacity approximation is up to an additive gap of order N bits and up to a multiplicative gap of order N^2, with both gaps independent of the channel gains. In this paper, we approximate the capacity of the symmetric Gaussian N-relay diamond network up to an additive gap of 1.8 bits and up to a multiplicative gap of a factor 14. Both gaps are independent of the channel gains and, unlike the best previously known result, are also independent of the number of relays N in the network. Achievability is based on bursty amplify-and-forward, showing that this simple scheme is uniformly approximately optimal, both in the low-rate as well as in the high-rate regimes. The upper bound on capacity is based on a careful evaluation of the cut-set bound. We also present approximation results for the asymmetric Gaussian N-relay diamond network. In particular, we show that bursty amplify-and-forward combined with optimal relay selection achieves a rate within a factor O(log^4(N)) of capacity with pre-constant in the order notation independent of the channel gains. Comment: 23 pages, to appear in IEEE Transactions on Information Theory.
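    As a rough illustration only (not the paper's careful evaluation), the sketch below computes the two extreme cuts of the cut-set bound for the symmetric network, assuming real-valued gains g to and h from the relays, unit noise variance, and per-node power P; taking the minimum over this subset of cuts still gives a valid, if loose, upper bound.

```python
import math

def two_cut_upper_bound(g: float, h: float, N: int, P: float = 1.0) -> float:
    """Loose capacity upper bound from two cuts of the symmetric N-relay
    diamond network (intermediate cuts are ignored)."""
    # Broadcast cut: source -> all N relays behaves like a SIMO link.
    broadcast_cut = math.log2(1 + N * g**2 * P)
    # MAC cut: all N relays -> destination with full cooperation
    # (coherent combining) behaves like a MISO link with per-relay power P.
    mac_cut = math.log2(1 + (N * h) ** 2 * P)
    return min(broadcast_cut, mac_cut)

print(two_cut_upper_bound(g=1.0, h=1.0, N=4, P=1.0))
```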

    Tracking Stopping Times Through Noisy Observations

    A novel quickest detection setting is proposed which is a generalization of the well-known Bayesian change-point detection model. Suppose \{(X_i,Y_i)\}_{i\geq 1} is a sequence of pairs of random variables, and that S is a stopping time with respect to \{X_i\}_{i\geq 1}. The problem is to find a stopping time T with respect to \{Y_i\}_{i\geq 1} that optimally tracks S, in the sense that T minimizes the expected reaction delay E(T-S)^+, while keeping the false-alarm probability P(T<S) below a given threshold \alpha \in [0,1]. This problem formulation applies in several areas, such as in communication, detection, forecasting, and quality control. Our results relate to the situation where the X_i's and Y_i's take values in finite alphabets and where S is bounded by some positive integer \kappa. By using elementary methods based on the analysis of the tree structure of stopping times, we exhibit an algorithm that computes the optimal average reaction delays for all \alpha \in [0,1], and constructs the associated optimal stopping times T. Under certain conditions on \{(X_i,Y_i)\}_{i\geq 1} and S, the algorithm running time is polynomial in \kappa. Comment: 19 pages, 4 figures, to appear in IEEE Transactions on Information Theory.
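    To make the two performance measures concrete, here is a toy Monte Carlo sketch (not the paper's tree-based algorithm; the Bernoulli source, the flip-noise observation model, and the naive threshold rule for T are all illustrative assumptions) that estimates E(T-S)^+ and P(T<S).

```python
import random

def estimate_delay_and_false_alarm(p=0.3, eps=0.1, kappa=50,
                                   trials=20_000, seed=0):
    """S = first time X_i = 1 (capped at kappa); T = first time the noisy
    observation Y_i = 1 (capped at kappa), where Y_i flips X_i w.p. eps."""
    rng = random.Random(seed)
    delay_sum, false_alarms = 0, 0
    for _ in range(trials):
        S = T = kappa                          # both stopping times bounded by kappa
        for i in range(1, kappa + 1):
            x = 1 if rng.random() < p else 0
            y = x if rng.random() >= eps else 1 - x   # noisy observation of x
            if S == kappa and x == 1:
                S = i
            if T == kappa and y == 1:
                T = i
        delay_sum += max(T - S, 0)             # reaction delay (T - S)^+
        false_alarms += T < S                  # false alarm if T fires before S
    return delay_sum / trials, false_alarms / trials

print(estimate_delay_and_false_alarm())
```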

    Fundamental Limits of Caching

    Caching is a technique to reduce peak traffic rates by prefetching popular content into memories at the end users. Conventionally, these memories are used to deliver requested content in part from a locally cached copy rather than through the network. The gain offered by this approach, which we term local caching gain, depends on the local cache size (i.e., the memory available at each individual user). In this paper, we introduce and exploit a second, global, caching gain not utilized by conventional caching schemes. This gain depends on the aggregate global cache size (i.e., the cumulative memory available at all users), even though there is no cooperation among the users. To evaluate and isolate these two gains, we introduce an information-theoretic formulation of the caching problem focusing on its basic structure. For this setting, we propose a novel coded caching scheme that exploits both local and global caching gains, leading to a multiplicative improvement in the peak rate compared to previously known schemes. In particular, the improvement can be on the order of the number of users in the network. Moreover, we argue that the performance of the proposed scheme is within a constant factor of the information-theoretic optimum for all values of the problem parameters. Comment: To appear in IEEE Transactions on Information Theory.
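    The two gains can be seen numerically. A minimal sketch, assuming the delivery-rate expressions below correctly summarize this line of work: for K users, N files, and cache size M files per user, the coded rate K(1 - M/N)/(1 + K M/N) combines the local gain factor (1 - M/N) with the global gain factor 1/(1 + K M/N) (exact when M is a multiple of N/K).

```python
def conventional_rate(K: int, N: int, M: float) -> float:
    """Uncoded delivery: each user fetches the uncached (1 - M/N) fraction
    of its file, so only the local caching gain appears."""
    return K * (1 - M / N)

def coded_rate(K: int, N: int, M: float) -> float:
    """Coded caching delivery rate; the extra 1/(1 + K*M/N) factor is the
    global caching gain from the aggregate cache size."""
    return K * (1 - M / N) / (1 + K * M / N)

K, N = 20, 20                        # 20 users, 20 files
for M in (0, 5, 10, 15):             # cache size in units of files
    print(M, conventional_rate(K, N, M), coded_rate(K, N, M))
```

    Already at M = 5 the coded rate is a factor of six below the uncoded one in this toy setting, illustrating an improvement on the order of the number of users.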

    Energy-Efficient Communication over the Unsynchronized Gaussian Diamond Network

    Communication networks are often designed and analyzed assuming tight synchronization among nodes. However, in applications that require communication in the energy-efficient regime of low signal-to-noise ratios, establishing tight synchronization among nodes in the network can result in a significant energy overhead. Motivated by a recent result showing that near-optimal energy efficiency can be achieved over the AWGN channel without requiring tight synchronization, we consider the question of whether the potential gains of cooperative communication can be achieved in the absence of synchronization. We focus on the symmetric Gaussian diamond network and establish that cooperative-communication gains are indeed feasible even with unsynchronized nodes. More precisely, we show that the capacity per unit energy of the unsynchronized symmetric Gaussian diamond network is within a constant factor of the capacity per unit energy of the corresponding synchronized network. To this end, we propose a distributed relaying scheme that does not require tight synchronization but nevertheless achieves most of the energy gains of coherent combining. Comment: 20 pages, 4 figures, submitted to IEEE Transactions on Information Theory, presented at IEEE ISIT 201
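    As a back-of-the-envelope illustration of the coherent-combining energy gain at stake (a toy AWGN calculation under assumed unit noise variance and real gains, not the paper's relaying scheme): capacity per unit energy is the limit of C(P)/P as the power P goes to zero, and combining N relays coherently scales the effective channel gain by a factor of N.

```python
import math

def bits_per_unit_energy(gain: float, powers=(1e-1, 1e-2, 1e-3, 1e-4)):
    """C(P)/P for an AWGN link with C(P) = log2(1 + gain * P); the capacity
    per unit energy is approached as P -> 0."""
    return [math.log2(1 + gain * P) / P for P in powers]

# Splitting a total power P across N relays with identical gain h and adding
# their signals coherently gives received power N * h**2 * P, i.e. a
# factor-N gain in effective channel gain over a single relay.
N, h = 4, 1.0
print(bits_per_unit_energy(h**2))       # single relay
print(bits_per_unit_energy(N * h**2))   # N coherently combining relays
```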

    Computation Alignment: Capacity Approximation without Noise Accumulation

    Consider several source nodes communicating across a wireless network to a destination node with the help of several layers of relay nodes. Recent work by Avestimehr et al. has approximated the capacity of this network up to an additive gap. The communication scheme achieving this capacity approximation is based on compress-and-forward, resulting in noise accumulation as the messages traverse the network. As a consequence, the approximation gap increases linearly with the network depth. This paper develops a computation alignment strategy that can approach the capacity of a class of layered, time-varying wireless relay networks up to an approximation gap that is independent of the network depth. This strategy is based on the compute-and-forward framework, which enables relays to decode deterministic functions of the transmitted messages. Alone, compute-and-forward is insufficient to approach the capacity as it incurs a penalty for approximating the wireless channel with complex-valued coefficients by a channel with integer coefficients. Here, this penalty is circumvented by carefully matching channel realizations across time slots to create integer-valued effective channels that are well-suited to compute-and-forward. Unlike prior constant gap results, the approximation gap obtained in this paper also depends closely on the fading statistics, which are assumed to be i.i.d. Rayleigh. Comment: 36 pages, to appear in IEEE Transactions on Information Theory.
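    To see where the integer-approximation penalty comes from, here is a small sketch using the real-valued compute-and-forward computation rate as I recall it from the Nazer-Gastpar framework (treat the exact expression, as well as the example channel and power, as assumptions): a relay searches over small integer coefficient vectors a and pays for the mismatch between a and the actual channel h.

```python
import itertools
import math

def computation_rate(h, a, P):
    """R(h, a) = 1/2 * log2+( (||a||^2 - P*(h.a)^2 / (1 + P*||h||^2))^-1 ),
    the rate at which a relay can decode the integer combination sum_i a_i x_i."""
    h_dot_a = sum(hi * ai for hi, ai in zip(h, a))
    norm_a2 = sum(ai * ai for ai in a)
    norm_h2 = sum(hi * hi for hi in h)
    eff_noise = norm_a2 - P * h_dot_a**2 / (1 + P * norm_h2)  # > 0 by Cauchy-Schwarz
    return max(0.0, 0.5 * math.log2(1 / eff_noise))

h, P = (1.0, 1.37), 100.0   # non-integer channel, moderate SNR
best_a, best_rate = max(
    ((a, computation_rate(h, a, P))
     for a in itertools.product(range(-3, 4), repeat=2) if any(a)),
    key=lambda item: item[1],
)
print(best_a, best_rate)    # best small integer coefficient vector and its rate
```

    The gap between this rate and what the channel could support with matched coefficients is the penalty that computation alignment sidesteps by pairing channel realizations across time slots so that the effective channels become integer-valued.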

    Decentralized Coded Caching Attains Order-Optimal Memory-Rate Tradeoff

    Replicating or caching popular content in memories distributed across the network is a technique to reduce peak network loads. Conventionally, the main performance gain of this caching was thought to result from making part of the requested data available closer to end users. Instead, we recently showed that a much more significant gain can be achieved by using caches to create coded-multicasting opportunities, even for users with different demands, through coding across data streams. These coded-multicasting opportunities are enabled by careful content overlap at the various caches in the network, created by a central coordinating server. In many scenarios, such a central coordinating server may not be available, raising the question of whether this multicasting gain can still be achieved in a more decentralized setting. In this paper, we propose an efficient caching scheme, in which the content placement is performed in a decentralized manner. In other words, no coordination is required for the content placement. Despite this lack of coordination, the proposed scheme is nevertheless able to create coded-multicasting opportunities and achieves a rate close to that of the optimal centralized scheme. Comment: To appear in IEEE/ACM Transactions on Networking.
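    A minimal numeric sketch of that comparison, assuming the two delivery-rate expressions below (the decentralized rate (N/M - 1)(1 - (1 - M/N)^K) is quoted from memory and should be treated as an assumption rather than the paper's stated result):

```python
def centralized_rate(K: int, N: int, M: float) -> float:
    """Coded caching with centrally coordinated placement."""
    return K * (1 - M / N) / (1 + K * M / N)

def decentralized_rate(K: int, N: int, M: float) -> float:
    """Coded caching with random, uncoordinated placement (0 < M <= N)."""
    return (N / M - 1) * (1 - (1 - M / N) ** K)

K, N = 20, 20
for M in (2, 5, 10, 15):
    print(M, round(decentralized_rate(K, N, M), 2), round(centralized_rate(K, N, M), 2))
```

    In this toy setting the decentralized rate stays within a small constant factor of the centralized one across cache sizes, which is the qualitative behavior the abstract describes.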
    • …